Six-year trajectories and associated factors of positive and negative symptoms in schizophrenia patients, siblings, and controls: Genetic Risk and Outcome of Psychosis (GROUP) study

Positive and negative symptoms are prominent but heterogeneous characteristics of schizophrenia spectrum disorder (SSD). Within the framework of the Genetic Risk and Outcome of Psychosis (GROUP) longitudinal cohort study, we aimed to distinguish and identify the genetic and non-genetics predictors of homogenous subgroups of the long-term course of positive and negative symptoms in SSD patients (n = 1119) and their unaffected siblings (n = 1059) in comparison to controls (n = 586). Data were collected at baseline, and after 3- and 6-year follow-ups. Group-based trajectory modeling was applied to identify latent subgroups using positive and negative symptoms or schizotypy scores. A multinomial random-effects logistic regression model was used to identify predictors of latent subgroups. Patients had decreasing, increasing, and relapsing symptoms course. Unaffected siblings and healthy controls had three to four subgroups characterized by stable, decreasing, or increasing schizotypy. PRSSCZ did not predict the latent subgroups. Baseline symptoms severity in patients, premorbid adjustment, depressive symptoms, and quality of life in siblings predicted long-term trajectories while were nonsignificant in controls. In conclusion, up to four homogenous latent subgroups of symptom course can be distinguished within patients, siblings, and controls, while non-genetic factors are the main factors associated with the latent subgroups.


Results
Baseline characteristics of participants. Three-fourths of patients were male (p < 0.001) and patients were younger than controls (p < 0.001). Participants significantly differed in PRS SCZ and sociodemographic, functional, and clinical characteristics at baseline ( Table 1). The mean PRS SCZ at P T of 0.05, 0.1, and 0.5 of patients was significantly higher than that of siblings and controls (p < 0.001) and the mean PRS SCZ of siblings was significantly higher than that of controls (p < 0.001). In general, characteristics of unaffected siblings laid between patients and controls.
Trajectories of positive and negative symptoms. As Fig. 1 shows, multiple clinically and statistically meaningful trajectories of positive and negative symptoms were identified across all samples. In 1136 eligible patients, wedistinguished three trajectories of positive symptoms that were labeled as low, moderate, and high and three trajectories of negative symptoms, labeled as low, high-decreased, and high-increased. In sensitivity analysis, the core negative symptoms trajectories (data not shown) were highly similar to the original in terms of shapes and frequencies. Chi-square showed very low probability that these distributions are independent (Chisquared = 1353.3, df = 4, p-value < 2.2e−16) as the vast majority of patients were allocated in the same trajectory shape in both analyses. Only 9% of patients were allocated to different groups in these two analyses (over the total PANSS score and over the core symptoms), while 3% got allocated to the opposite group. www.nature.com/scientificreports/ In 1045 eligible siblings, we found four trajectories of positive symptoms: low, moderate, high, and highdecreased and four trajectories of negative symptoms low, moderate, high-decreased and high-increased. In 583 controls, we found three trajectories of positive symptoms: low, moderate, and high-decreased, and three stable trajectories of negative symptoms such as low, moderate, and high. The model accuracy in all group modeling ranged from 70 to 91%. In general, larger variation in trajectories was observed in patients with stability and persistence of symptoms in about two-thirds of patients, reduction of symptoms in more than one-fifth of patients, and worsening symptoms in about one-tenth of patients. Trajectory model fit indices and parameter estimates are presented in Supplementary Tables S1-S8.
Predictors of positive and negative symptoms trajectories. In patients, univariable regression analysis showed that various sociodemographic, functional, cardiometabolic, and clinical factors were significantly associated with positive and negative symptom trajectories in patients (Supplementary Table S9). PRS SCZ (at all p-value thresholds tested) was not significantly associated with positive and negative symptoms trajectory (Supplementary Table S9). Following adjustment for covariables, only baseline positive and negative symptoms were significantly associated with positive (High vs Low: OR 2.77; 95% CI 1. 46-5.26; p = 0.002) and negative symptom trajectories (High-Decreased vs Low: OR 2.15; 95% CI 1.78-2.60; p < 0.0001; and High-Increased vs Low: OR 1.60; 95% CI 1.38-1.86; p < 0.0001) respectively (Table 2). We found similar results in the sensitivity analysis using only core negative symptoms (High-Decreased vs Low: OR 2.08; 95% CI 1.14-3.77; p = 0.016).
In siblings, several sociodemographic, functional, and clinical factors were significantly associated with positive and negative symptom trajectories (Supplementary Table S10). Additionally, PRS SCZ (at p-value thresholds 0.05 and 0.1) was significantly associated with positive symptom trajectories (High vs Low: OR 1.09; 95% CI   (Table 3). In controls, positive and negative symptom trajectories were significantly associated with diverse sociodemographic, functional, and clinical factors (Supplementary Table S11). PRS SCZ (at all p-value thresholds tested) was not significantly associated with positive and negative symptoms trajectory (Supplementary Table S11). In the multivariable model, age (High-decreased vs Low: OR 0.91, 95% CI 0.83-0.99, p = 0.04) and quality of life (High-decreased vs Low: OR 0.04; 95% CI 0.002-0.76; p = 0.03) was found to be a strong predictor of positive symptoms trajectory, whereas premorbid adjustment (High vs Low: OR 7.74; 95% CI 1.01-59.38; p = 0.05) were found to be a strong predictor of negative symptoms trajectory (Table 4).

Discussion
We investigated the long-term trajectories of positive and negative symptoms among patients with schizophreniaspectrum disorder, their unaffected siblings, and healthy controls. We also examined the role of genetic (PRS SCZ ), sociodemographic, functional, cardiometabolic, and clinical factors.
In this study, three to four subgroups of patients, siblings, and controls were identified based on positive and negative symptoms. This is in line with previous studies, as presented in our recent systematic review, that identified two to five latent groups across psychotic symptom dimensions and population groups (i.e., patients, siblings and controls), though the study characteristics and findings (e.g., actual number identified subgroups and percentage distribution in each subgroup) were different for individual studies. The use of different statistical modeling techniques may lead to the identification of a different number of subgroups. For example, a study on the application of five statistical data-driven subtyping methods on longitudinal data showed that the number of trajectory groups derived from one method can be remarkably different from the other method using the same www.nature.com/scientificreports/ data structure 27 . Differences in instruments to measure clinical symptom severity may also affect the number of identified trajectories. We found a significant association between PRS SCZ and the six-year trajectory of positive symptoms in patients and siblings but attenuated after adjustment for sociodemographic, clinical, and functional factors. Earlier studies on the association between schizophrenia phenotypes and PRS SCZ have largely shown inconsistent results, and some are concordant with our findings from the bivariate correlation analyses. A systematic review of PRS-based studies showed that PRS SCZ was significantly associated with the severity of negative symptoms, but not with positive symptoms, in patients and the healthy general population wherein most estimates were based on correlation tests 26 . A population-based birth cohort study showed that PRS SCZ was significantly associated with negative symptoms in adolescence in the general population, while others found no evidence for an association between www.nature.com/scientificreports/ PRS SCZ and psychotic experiences/positive symptoms 20,28 . Other cross-sectional and longitudinal studies 18,25,29,30 found no association between PRS SCZ and positive and/or negative symptoms in patients and healthy individuals. The inconsistencies between studies could be due to the difference in the assessment instruments for positive and negative symptoms, sampling, sample size, genotyping, and quality control methods used in GWAS, and inclusion of patients at vastly different stages of illness and with a diverse spectrum of symptoms. Additionally, PRS is highly dependent on factors, such as the sample characteristics, sample size, stage, and/or severity of the disease that leads to variation in findings across studies 31 . Furthermore, the PRS was derived from a schizophrenia diagnosis, which is expected to represent less than half the participants with negative symptoms due to the schizophrenia illness itself (i.e. primary negative symptoms) and may not be predictive for negative symptoms related to secondary causes (i.e. secondary negative symptoms), such as substance abuse and depression.
In line with an earlier study 15 , baseline positive and negative symptoms significantly predicted positive and negative symptom trajectories in patients, respectively. This finding suggests that patients who had severe symptoms at baseline showed persistence and stability of the initial level of symptoms 32 . On the other hand, an earlier study showed that the severity of positive symptoms initially decreased and became stable over time while negative symptoms showed persistence over time 16 . In siblings, poor premorbid adjustment and quality of life were found to be strong predictors of positive and negative trajectories. In controls, none of the environmental factors survived adjustment for covariables. The lack of association following adjustment for covariables in our study is not surprising given that these factors were selected from previous group-based trajectory modeling studies that mostly performed univariable analyses or just compared trajectories using proportion or mean estimates. In general, at least in the univariable model, multiple factors were found to be strong predictors of positive and negative symptoms trajectories, and this may support the notion that positive and negative symptoms share a similar course [12][13][14] .
We investigated the long-term trajectories of schizophrenia symptoms and their association with a broad range of sociodemographic, clinical, functional, and genetic (PRS SCZ ) factors in family-based cohorts with large samples of people with psychosis, their unaffected relatives, and controls. This eases the comparison of the patterns and offers new perspective on genetic and environmental contributions to the development of trajectories. Interestingly, results on the number of the identified latent subgroups and course over time were comparable across patients, unaffected siblings and healthy controls, and provided a similar trend of evidence, though the predictors and level of significance were different. Furthermore, we ensured the validity and clinical meaningfulness of the identified trajectories by observing heterogeneity across all sample groups, comparing findings from previous studies in our systematic review, and involving clinicians during trajectory modeling. However, the current study has also some limitations. PRS is based on common genetic variants and naturally does not capture copy number variants (CNV), or rare SNP contributions to variance. Besides, trajectory group identification and their association with PRS SCZ may be different if it is done based on other symptom definitions, e.g. total PANSS score, a subdomain of positive (e.g., hallucination, delusion, disorganization), or PRS constructed based GWAS summary statistics of these quantitative phenotypes. Another limitation is related to the use of PANSS to assess negative symptoms. This scale includes aspects that are not conceptualized as negative symptoms, but it evaluates symptoms belonging to the experiential domain (avolition, anhedonia, asociality) only at the behavioral level. There is also no distinction between primary and secondary symptoms in our study, which could have quite different trajectories. Extrapyramidal side effects impact negative symptoms; unfortunately, they were not measured in GROUP and hence were not accounted for. Regarding PANSS positive symptoms, reality distortion (e.g., hallucinations and delusions) is quite distinct from the disorganization of thought. Moreover, the trajectory analyses were not cross-validated in an independent sample or in split-sample.

Conclusions
Three to four trajectories or latent subgroups were identified in patients, relatives, and controls. PRS SCZ did not predict latent subgroups and long-term trajectories of positive and negative symptoms in patients, siblings, and controls. Among non-genetic factors, baseline symptoms severity in patients, and premorbid adjustment, and health-related quality of life in siblings predicted long-term trajectories while none of them were significant in controls. Added to previous knowledge, the longitudinal clinical course of schizophrenia can be distinguished into predictable stable trajectories when non-genetic factors may be sufficient to distinguish latent subgroups and predict their longitudinal course of schizophrenia symptoms. A prior prediction of the best fit corresponding clinical trajectory would guide psychiatrists for choosing of the right patient-tailored intervention when treating first episode schizophrenia. Large-scale longitudinal studies with robust measures of quantitative phenotypes using a harmonized measuring instrument are needed to determine how genetic risk for schizophrenia is expressed and whether this expression changes with time, to examine potential mediators and moderators of risk, and to determine the usefulness of PRS SCZ for prediction of transition to psychosis in siblings and healthy people.

Methods
Study population. Data of a 6-year longitudinal multi-center national study was analyzed comprising 1119 patients, 1059 unaffected siblings, and 586 healthy controls who were eligible at baseline, using the 7th official release of the Genetic Risk and OUtcome of Psychosis (GROUP) cohort data. Patients were included if diagnosed with a schizophrenia spectrum disorder, age range of 16 to 50 years, good command of the Dutch language, and willing and capable of giving written informed consent. Siblings and controls were included if they had no known lifetime psychotic disorder. The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria were used to diagnose schizophrenia spectrum disorder with Comprehensive Assessment of Symptoms and History interview 33 . During the 6 years follow-up, 14 siblings and three controls developed psychosis; therefore, they were included in the patient group in all analyses. Sociodemographic, func- www.nature.com/scientificreports/ tional, cardiometabolic, clinical and genetic data were collected at baseline, and after 3 years and 6 years. The study was approved by the Ethics Committees of the Ethical Review Board of the University Medical Centre Utrecht and subsequently by local review boards of each participating institute, in accordance with the Declaration of Helsinki. Besides, all methods were performed in accordance with the published study protocol 34 . Details on the sample size and power calculation, sample characteristics, and recruitment and assessment procedures presented in the published protocol 34 .

Measurement of variables. Positive and negative symptoms. The Positive and Negative Syndrome Scale
(PANSS) for schizophrenia was administered to measure the severity of positive and negative symptoms in patients 35 . The PANSS is a reliable and valid tool that rates severity of symptoms in an incremental seven-point Likert severity scale (from 1 = none, 2 = minimal, to up to 7 = extreme). The positive symptom subscale score was calculated as the sum of seven positive symptom items, and the negative subscale score was the sum of seven negative symptoms. As part of sensitivity analysis, negative subscale score was also calculated according to recent guidelines on measuring core negative symptoms as a sum of five negative symptoms (excluding previously integrated abstract thinking and stereotyped thinking) [36][37][38] . Positive and negative schizotypy in siblings and healthy controls were assessed with the Structured Interview for schizotypy-revised (SIS-R) [39][40][41] . The SIS-R is a reliable tool and items are rated on an incremental four-point Likert severity scale (from 0 = absent, 1 = mild, 2 = moderate, and 3 = severe). The positive dimension score was calculated as the mean of positive schizotypy (referential thinking, delusional mood, magical ideation, illusions, and suspiciousness), and the negative subscale score was calculated as the mean of negative schizotypy (social isolation, social anxiety, introversion, and restricted affect).
Sociodemographic, functional, and clinical factors. The sociodemographic variables were age, gender, marital status, ethnicity, and educational status (i.e., years of education). Clinical and functional variables were age of onset of illness, duration of illness, number of psychotic episodes, antipsychotic treatment, substance use, premorbid adjustment, estimated IQ, general cognition, psychotic symptoms or psychotic-like experiences, depression, social and global functioning, and quality of life. The variables are discussed below. We also collected cardiometabolic data, such as body mass index, waist circumference, glycated hemoglobin, triglycerides, low-density lipoprotein, high-density lipoprotein, systolic and diastolic blood pressure and pulse rate through physical and laboratory examination only in patients at the third year of follow-up. Cognitive function was assessed using a comprehensive neuropsychological test battery 27,42,43 that included the Word Learning Task (i.e. immediate recall and delayed recall), Continuous Performance Test-HQ (CPT-HQ) (CPT performance index and CPT variability), WAIS-III Digit Symbol Substitution Test, WAIS-III Information, WAIS-III Calculation, and WAIS-III Block Design test. A composite score as a measure of general cognition was generated using principal component analysis. Details on the assessment of these tasks, scoring system and calculation of composite score are published elsewhere 34,44 . The Community Assessment of Psychic Experiences (CAPE) was used to assess psychotic experiences (CAPE-42; www. cape42. homes tead. com) 45,46 . The Calgary Depression Scale for Schizophrenia (CDSS) was used to assess depression 47 . Premorbid adjustment, which is a measure of the degree of achievement of developmental goals at each of several periods of a subject''s life before the onset of schizophrenia, was assessed using the premorbid adjustment scale 48 . The World Health Organization Quality of Life-BREF (WHOQOL-BREF) questionnaire, which has high construct validity and reliability, was used to assess the quality of life 49 ; the Social Functioning Scale (SFS) and Global Assessment of Functioning (GAF) -disability were used to measure social (total score of SFS), occupational (sub-score of SFS) and global functioning 50,51 .
Genotyping, quality control (QC), and polygenic risk score (PRS). Blood samples were obtained at baseline for genotyping. Genotype data for 2,812 individuals was generated on a customized Illumina, IPMCN array with 570,038 single nucleotide polymorphisms (SNPs). This chip contains ~ 250 k common SNPs, 250 K Exome chip variants (rare, exonic, nonsynonymous, MAF < 1%), and ~ 50 K psychiatric-related variants. Quality control (QC) procedures were performed using PLINK v1.9 52 . In total, 2505 individuals and 275,021 SNPs passed QC steps. SNPs were imputed on the Michigan server 53 using the HRC r1.1 2016 reference panel with European samples after phasing with Eagle v2.3. Finally, PRSs for 2505 samples were calculated using schizophrenia-associated alleles and effect sizes reported in the GWAS summary statistics from the Psychiatric genetics consortium (PGC) 2022 54 by excluding Dutch subjects while adjusting for population ancestry estimate (i.e., PCA components). Polygenic risk scores for SCZ were derived from a European-ancestry study 1 and generated by applying a Bayesian framework method that uses continuous shrinkage (cs) on SNP effect sizes. PRS-cs-auto is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local linkage disequilibrium patterns as described in detail elsewhere. PRSs were analyzed for four schizophrenia GWAS p-value thresholds of 5 × 10 -8 , 0.05, 0.1, 0.5 and 1.0. Each PRS separately modeled to compare results and identify the most predictive and discriminant PRS for observed trajectories. See the Supplementary Methods for detailed information. The top three principal components were used to correct genetic ancestry effect.
Statistical analysis. Differences in patient, sibling and control sociodemographic, clinical and functional characteristics at baseline were explored using a linear mixed-effects model for continuous variables and Pearson's Chi-square tests for categorical variables. Genetic profile (PRS SCZ ) was also compared across samples. Since individuals are clustered in families, observations from the same family are correlated. Therefore, we declared family as a random effect to set up a common correlation among all observations from the same family in the regression modeling. Family as a random effect was ignored when the model poorly fitting (i.e., G-matrix was not positive definite). The Maximum Likelihood (ML) method was used to estimate the model parameters and www.nature.com/scientificreports/ fixed-effects (i.e., type III overall group comparison tests) model results were interpreted. The bivariate association between PRS SCZ and baseline positive and negative symptoms was also examined using the Spearman's correlations test. Group-based trajectory modeling (GBTM) using PROC TRAJ was applied to identify latent homogeneous subgroups that have a unique symptom trajectory 55 . The modeling process is explained in detail in our earlier publication 44 and in the Supplementary Methods. A trajectory is defined as a group of individuals who have a symptom profile that is homogenous within the group but significantly heterogeneous between groups 56 . We used the terms trajectory and latent subgroup interchangeably. The multinomial random-effects logistic regression model was used because symptomatic trajectories had more than two nominal categories and study participants were clustered within a family. Besides, all analyses used full information ML estimation, which uses all data, including partial cases, to arrive at unbiased parameter estimates. Univariable and multivariable multinomial random-effects logistic regression models 57 using PROC NLMIXED were fitted using symptomatic trajectories as an outcome variable, and PRS SCZ and sociodemographic, functional, clinical and cardiometabolic factors as predictors. PROC NLMIXED maximizes the likelihood function of the multinomial random-effects model by the Adaptive Gaussian quadrature method and Dual Quasi-Newton optimization technique, and therefore, provides stable parameter estimates. PROC LOGISTIC was used to estimate initial parameters (β) that were used in the random-effects logistic regression model and to calculate explained variance (i.e., Nagelkerke R 2 ). In the multivariate model, all significant variables at alpha level ≤ 0.025 in the univariable model were included. Variables not collected at baseline assessment (i.e., depressive symptoms, social and occupational functioning, and physical health parameters) were not included to avoid data separation points and to increase model convergence. All tests were adjusted for multiple comparisons by Bonferroni correction; therefore, the significant threshold was set to be 0.025 (i.e., 0.05 divided the number of comparisons). We made two comparisons across all samples with one group used as a reference and two subgroups/trajectories were merged in siblings. The PRS SCZ at p-value threshold 0.05 (i.e., considered as the most predictive) was included in the multivariable model, even when its effect was not significant in the univariable model because only we have PRS SCZ to assess genetic susceptibility. All the analyses were done using R 3.6.0 and SAS 9.3 software.

Data availability
The data that support the findings of this study are available from the GROUP cohort principal investigators (B.Z.A. and R.B.) on reasonable request. The data are not publicly available due to containing information that could compromise research participant privacy or consent.